Analysis and Enhancement of Wikification for Microblogs with Context Expansion

Authors

  • Taylor Cassidy
  • Heng Ji
  • Lev-Arie Ratinov
  • Arkaitz Zubiaga
  • Hongzhao Huang
Abstract

Disambiguation to Wikipedia (D2W) is the task of linking mentions of concepts in text to their corresponding Wikipedia entries. Most previous work has focused on linking terms in formal texts (e.g., newswire) to Wikipedia. Linking terms in short informal texts (e.g., tweets) is difficult for systems and humans alike because such texts lack a rich disambiguation context. We first evaluate an existing Twitter dataset as well as the D2W task in general. We then test the effects of two tweet context expansion methods, based on tweet authorship and topic-based clustering, on a state-of-the-art D2W system and evaluate the results.

Title and abstract in Basque

Testuinguruaren Hedapenaren Analisia eta Hobekuntza Mikroblogak Wikifikatzeko

Esanahia Wikipediarekiko Argitzea (D2W) deritzo testuetan aurkitutako kontzeptuen aipamenak Wikipedian dagozkien sarrerei lotzeari. Aurreko lan gehienek testu formalak (newswire, esate baterako) lotu dituzte Wikipediarekin. Testu informalak (tweet-ak, esate baterako) lotzea, ordea, zaila da bai sistementzat eta baita gizakiontzat ere, argipena erraztuko luketen testuingururik ez dutelako. Lehenik eta behin, Twitter-en gainean sortutako datu-sorta bat, eta D2W ataza bera ebaluatzen ditugu. Ondoren, egungo D2W sistema baten gainean testuingurua hedatzeko bi teknika aztertu eta ebaluatzen ditugu. Bi teknika hauek tweet-aren egilean eta gaikako multzokatze metodo batean oinarritzen dira.

Similar resources

Transmission Network Expansion Planning under Electricity Market Conditions Considering the Cost of Ensuring Security

An important factor to be considered in electric power system expansion planning is the security of service that the system is able to provide. In restructured power systems, variables such as agents’ profit or Locational Marginal Price (LMP) variances are considered in transmission expansion planning. To have a secure network this plan would be refined for simulated contingencies. This p...

Relational Inference for Wikification

Wikification, commonly referred to as Disambiguation to Wikipedia (D2W), is the task of identifying concepts and entities in text and disambiguating them into the most specific corresponding Wikipedia pages. Previous approaches to D2W focused on the use of local and global statistics over the given text, Wikipedia articles and its link structures, to evaluate context compatibility among a list ...

MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs

In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of tweets. To this end, we create subjectivity lexicons in which the words into ...

Comparing the Impact of Audio-Visual Input Enhancement on Collocation Learning in Traditional and Mobile Learning Contexts

This study investigated the impact of audio-visual input enhancement teaching techniques on improving English as a Foreign Language (EFL) learners' collocation learning as well as their accuracy concerning collocation use in narrative writing. In addition, it compared the impact and efficiency of audio-visual input enhancement in two learning contexts, namely traditional and mo...

Measuring Morphometric Characteristics of Gullies in Southeastern Iran by Digital Processing of ETM+ Sensor Images

The Dasht Yari plain covers nearly 580,000 hectares and is subject to gully erosion, and the gully development rate has increased in recent decades. Satellite images may provide quick, extensive, and valuable information for the interpretation of morphometric characteristics of gully erosion expansion due to having attributes such as time series, relatively low cost, large coverag...

Publication date: 2012